Abstract



We present an algorithm for building a circuit that approximates single qubit unitaries with precision $\varepsilon$ using
$O(\log(1/\varepsilon))$ Clifford and T gates and employing up to two ancillary qubits. The algorithm for computing our
approximating circuit requires an average of $O(\log^2(1/\varepsilon)\log\log(1/\varepsilon))$ operations. We prove that the number of
gates in our circuit saturates the lower bound on the number of gates required in the scenario when a constant
number of ancillae are supplied, and as such, our circuits are asymptotically optimal. This results in a significant
improvement over the current state of the art for finding an approximation of a unitary, including the Solovay-Kitaev
algorithm, which requires $O(\log^{3+\delta}(1/\varepsilon))$ gates and does not use ancillae, and the phase kickback approach, which requires
$O(\log^2(1/\varepsilon)\log\log(1/\varepsilon))$ gates but uses $O(\log^2(1/\varepsilon))$ ancillae.

1 Introduction

The efficient approximation of a unitary using a discrete universal gate set is crucial for building a scalable quantum
computing device. Barenco et al. [1] showed that any unitary may be implemented by a circuit with CNOT and single
qubit gates, effectively reducing the problem to that of single qubit unitary synthesis/approximation. A constructive
answer to the question of how to approximate a single qubit unitary by a quantum circuit is given by the Solovay-Kitaev
algorithm [2, 3]. While the Solovay-Kitaev algorithm may be applied to approximating multiple qubit/qudit unitaries
by quantum circuits, in practice it remains most useful for single qubit approximations.

Technically, the problem of single qubit circuit synthesis is formulated as follows: given a discrete universal gate set,
or "library", find a sequence of gates from it that approximates a given unitary with precision $\varepsilon$. The parameter $\varepsilon$ determines
the complexity of the resulting approximation.

Computing an approximation using the standard version of the Solovay-Kitaev algorithm [3] takes $O(\log^{2.71}(1/\varepsilon))$
steps on a classical computer, and the number of gates in the resulting quantum circuit is $O(\log^{3.97}(1/\varepsilon))$. The best
known upper bound on the circuit size resulting from the application of the Solovay-Kitaev algorithm is $O(\log^{3+\delta}(1/\varepsilon))$,
where $\delta$ can be chosen arbitrarily small [2]. On the other side, Harrow et al. [4] show an $\Omega(\log(1/\varepsilon))$ lower bound
on the number of gates in the approximating circuit. A certain library of quantum gates that allows approximating a
single qubit unitary to precision $\varepsilon$ with a circuit containing at most $O(\log(1/\varepsilon))$ gates is also reported in [4]. However,
no efficient algorithm to construct a circuit meeting the lower bound in the number of gates is known. Furthermore,
the gate set used, $\{(I+2iX)/\sqrt{5},\,(I+2iY)/\sqrt{5},\,(I+2iZ)/\sqrt{5}\}$, is not considered to be well-suited for a fault-tolerant implementation, in contrast to
the Clifford and T library. To the best of our knowledge, constructive saturation of the logarithmic lower bound in the
Clifford and T library has not been shown yet; however, numerical evidence based on an exponential-time breadth-first
search algorithm supports the conjecture that this is indeed the case [5]. Our result comes close to exactly meeting the
lower bound: our gate count is logarithmic, $O(\log(1/\varepsilon))$, but we use an additional resource in the form of at most
two qubits initialized to the state $|0\rangle$.

Allowing additional resources helps to achieve interesting improvements over the Solovay-Kitaev algorithm. For
example, using a special resource state $|\gamma\rangle$ on $O(\log(1/\varepsilon))$ qubits makes it possible to achieve the desired accuracy of approximation
with a circuit of depth $O(\log\log(1/\varepsilon))$ containing $O(\log(1/\varepsilon))$ gates [2]; this is known as the phase kickback algorithm. However,
the resource state preparation requires $O(\log^2(1/\varepsilon))$ ancilla qubits and a circuit of depth $O(\log^2(\log(1/\varepsilon)))$ containing
$O(\log^2(1/\varepsilon)\log\log(1/\varepsilon))$ gates. Furthermore, exact preparation of the resource state $|\gamma\rangle$ is not possible using gates
from the Clifford and T library and qubits initialized to $|0\rangle$ [8]. In comparison, in our work we employ only two
ancillae prepared in the simple state $|0\rangle$, and achieve approximation accuracy $\varepsilon$ using a circuit
with $O(\log(1/\varepsilon))$ gates. Later in the paper we show a lower bound of $\Omega(\log(1/\varepsilon))$ on the number of gates required to
approximate a unitary to accuracy $\varepsilon$ using a fixed number of ancillae initialized to $|0\rangle$ and any universal gate set.
One other recent approach uses resource states [6] and probabilistic circuits with classical feedback. The circuit itself,
excluding state preparation, requires on average a constant number of operations and a constant number of ancilla
qubits. The method requires precomputed ancillae in the states $R_z(2^n\varphi)H|0\rangle$ to implement $R_z(2^m\varphi)$. Our algorithm
does not rely on measurements and classical feedback, and our circuit is deterministic. More importantly, our
algorithm does not employ sophisticated ancilla states that may themselves require approximation, as they may not be
possible to prepare exactly in the Clifford and T library [8].

In our previous work [7], we showed that any single qubit unitary with entries $u_{ij}$ in the ring $\mathbb{Z}[i, 1/\sqrt{2}]$ can be
synthesized exactly using single qubit Clifford and T gates. Furthermore, we presented an asymptotically optimal
algorithm for finding a circuit with the minimal number of Hadamard gates and an asymptotically minimal total number
of gates. More precisely, if the squared absolute value of an entry of the single qubit unitary matrix, $|u_{ij}|^2$, can be
represented as $(a + \sqrt{2}b)/2^n$, where $a$ and $b$ are integers such that $\mathrm{GCD}(a, b)$ is odd, the total number of gates required
to synthesize the unitary is in $O(n)$. This work opened the door to bypassing the Solovay-Kitaev algorithm for fast
circuit approximation of single qubit unitaries, by efficiently approximating arbitrary unitaries with unitaries over the
ring $\mathbb{Z}[i, 1/\sqrt{2}]$. However, to date, no efficient ring round-off procedure has been reported, and it remains an important open
problem.
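The normalization "GCD$(a, b)$ is odd" can be illustrated with a short sketch. The helper name `reduce_sqrt2_fraction` is hypothetical; it only rewrites $(a + \sqrt{2}b)/2^n$ into the form with $\mathrm{GCD}(a, b)$ odd and reports the resulting exponent $n$, which is the quantity the gate count scales with. It does not perform the exact synthesis of [7].

```python
from math import gcd


def reduce_sqrt2_fraction(a: int, b: int, n: int) -> tuple[int, int, int]:
    """Rewrite (a + sqrt(2) * b) / 2**n so that gcd(a, b) is odd.

    Halving a and b while decrementing n leaves the value unchanged, so we
    repeat it while both a and b are even (i.e. while gcd(a, b) is even).
    The value 0 is returned as-is, since gcd(0, 0) = 0 can never become odd.
    """
    if a == 0 and b == 0:
        return a, b, n
    while gcd(a, b) % 2 == 0:
        a, b, n = a // 2, b // 2, n - 1
    return a, b, n


# (2 + 2*sqrt(2)) / 8 equals (1 + sqrt(2)) / 4, so the exponent is n = 2.
print(reduce_sqrt2_fraction(2, 2, 3))
```

Note that only the parity of the GCD matters: $(2 + \sqrt{2})/4$ is already in reduced form even though $a$ is even.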

Giles and Selinger [8] recently found an elegant way to prove the conjecture formulated in [7], stating that multiple
qubit unitaries over the ring $\mathbb{Z}[i, 1/\sqrt{2}]$ may be synthesized exactly using the Clifford and T library. In this paper, we
employ some of their results to show that, by adding at most two ancilla qubits, we can achieve asymptotically optimal
approximation of single qubit unitaries in the Clifford and T library.

The significance of the improvement provided by our approach is best seen when, for a fixed precision $\varepsilon$, all of the
approximating circuit parameters, namely the depth, the number of gates, and the number of ancillae, are combined into one
aggregate figure, for example, the product of these three parameters.



2 Main result 

We focus on the approximation of the following operator: 

$$\Lambda(e^{i\varphi}) : \alpha|0\rangle + \beta|1\rangle \mapsto \alpha|0\rangle + \beta e^{i\varphi}|1\rangle.$$

We note that any single qubit unitary can be decomposed in terms of a constant number of Hadamard gates and $\Lambda(e^{i\varphi})$ gates
(see the solution to Problem 8.1 in [5]). Therefore, the ability to approximate $\Lambda(e^{i\varphi})$ implies the ability to approximate
any single qubit unitary.
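One standard way to see why Hadamard gates and $\Lambda(e^{i\varphi})$ suffice (not necessarily the argument given in [5]) is the $ZXZ$ Euler decomposition: every single qubit unitary equals, up to a global phase, $R_z(\theta_1)R_x(\theta_2)R_z(\theta_3)$. Since $\Lambda(e^{i\theta}) = e^{i\theta/2}R_z(\theta)$ and $HR_z(\theta)H = R_x(\theta)$, the product $\Lambda(e^{i\theta_1})\,H\,\Lambda(e^{i\theta_2})\,H\,\Lambda(e^{i\theta_3})$ equals $e^{i(\theta_1+\theta_2+\theta_3)/2}R_z(\theta_1)R_x(\theta_2)R_z(\theta_3)$. The pure-Python sketch below checks this identity numerically; all helper names are illustrative.

```python
import cmath
import math

Matrix = list[list[complex]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Product of two 2x2 complex matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def lam(theta: float) -> Matrix:
    """Lambda(e^{i theta}) = diag(1, e^{i theta})."""
    return [[1, 0], [0, cmath.exp(1j * theta)]]


def rz(theta: float) -> Matrix:
    return [[cmath.exp(-0.5j * theta), 0], [0, cmath.exp(0.5j * theta)]]


def rx(theta: float) -> Matrix:
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]


H: Matrix = [[1 / math.sqrt(2), 1 / math.sqrt(2)], [1 / math.sqrt(2), -1 / math.sqrt(2)]]


def via_lambda(t1: float, t2: float, t3: float) -> Matrix:
    """Lambda(e^{i t1}) H Lambda(e^{i t2}) H Lambda(e^{i t3})."""
    result = lam(t3)
    for gate in (H, lam(t2), H, lam(t1)):
        result = matmul(gate, result)  # left-multiply: later gates act after earlier ones
    return result


def euler_zxz(t1: float, t2: float, t3: float) -> Matrix:
    return matmul(rz(t1), matmul(rx(t2), rz(t3)))


def equal_up_to_phase(u: Matrix, v: Matrix, phase: complex, tol: float = 1e-12) -> bool:
    return all(abs(u[i][j] - phase * v[i][j]) < tol for i in range(2) for j in range(2))


t1, t2, t3 = 0.3, 1.1, -0.7
phase = cmath.exp(0.5j * (t1 + t2 + t3))
print(equal_up_to_phase(via_lambda(t1, t2, t3), euler_zxz(t1, t2, t3), phase))
```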

There are two main steps in our algorithm: 

1. Find a circuit $C$ consisting of Clifford and T gates such that the result of applying $C$ to $|00\rangle$ is close to $e^{i\varphi}|00\rangle$.

2. Apply circuit $C$ controlled on the first qubit to perform a transformation close to:

$$\alpha|000\rangle + \beta|100\rangle \mapsto \alpha|000\rangle + \beta e^{i\varphi}|100\rangle.$$



It can be observed that the net effect of such a transformation is the application of $\Lambda(e^{i\varphi})$ to the
first qubit. To accomplish the first step, we approximate $e^{i\varphi}|00\rangle$ with a four-dimensional vector $|v\rangle$ with entries in the
ring $\mathbb{Z}[i, 1/\sqrt{2}]$. We then employ an algorithm for multiple qubit exact synthesis to find a circuit $C$ that prepares $|v\rangle$
starting from $|00\rangle$ using at most one ancilla qubit. It was shown in [9] that any circuit using Clifford and T gates
can be transformed into its exact (meaning no further approximation is required) controlled version with only a linear
overhead in the number of gates, using at most one ancilla qubit in the state $|0\rangle$ that is returned unchanged. However, our
analysis shows that this additional ancilla is not needed at this step. The resulting total number of
ancillae is thus at most two.

2.1 Approximating $e^{i\varphi}|00\rangle$

The key is the reduction of the approximation problem to expressing an integer as a sum of four squares. In
particular, we are looking for an approximation of:

$$e^{i\varphi}|00\rangle = (\cos\varphi + i\sin\varphi,\ 0,\ 0,\ 0)$$

by a unit vector:

$$|v\rangle := \frac{1}{2^k}\left(\lfloor 2^k\cos\varphi\rfloor + i\lfloor 2^k\sin\varphi\rfloor,\ 0,\ a + ib,\ c + id\right),$$

where $k \in \mathbb{N}$ and $a, b, c, d \in \mathbb{Z}$. Without loss of generality we can assume that $0 < \varphi < \pi/2$. The power $k$ of the denominator
determines the precision of our approximation and the complexity of the resulting circuit. As $|v\rangle$ must be a unit vector, the
remaining four parameters ($a$, $b$, $c$, and $d$) should satisfy the integer equation:

$$a^2 + b^2 + c^2 + d^2 = 4^k - \lfloor 2^k\cos\varphi\rfloor^2 - \lfloor 2^k\sin\varphi\rfloor^2.$$

Lagrange's four-square theorem states that this equation always has a solution. Furthermore, there exists an efficient
probabilistic algorithm for finding a solution: for the right-hand side $M$ it requires on average $O(\log^2(M)\log\log M)$
operations with integers smaller than $M$; it is described in Theorem 2.2 in [10]. We get a reduction to such a simple
Diophantine equation at the expense of using two qubits instead of one.
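To make the reduction concrete, here is a minimal sketch of computing $|v\rangle$. It uses a simplified randomized strategy in the spirit of, but not identical to, the algorithm of [10]: pick random $a, b$, hope that $p = M - a^2 - b^2$ is a prime $\equiv 1 \pmod 4$, and split $p$ into two squares with Cornacchia's algorithm, falling back to exhaustive search if unlucky. All function names are hypothetical, the Miller-Rabin test with the first twelve prime bases is only known to be deterministic below roughly $3.3\cdot 10^{24}$ (a false positive is caught by the final check), and the floating-point evaluation of $\cos$ and $\sin$ restricts `approximating_state` to moderate $k$ (say $k \le 40$).

```python
import math
import random

_BASES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def is_probable_prime(n: int) -> bool:
    """Miller-Rabin test with the first twelve primes as bases."""
    if n < 2:
        return False
    for p in _BASES:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for a in _BASES:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True


def two_squares_of_prime(p: int, rng: random.Random):
    """Return (x, y) with x^2 + y^2 = p for a prime p = 1 (mod 4), or None."""
    for _ in range(64):  # a random c is a non-residue with probability 1/2
        t = pow(rng.randrange(2, p - 1), (p - 1) // 4, p)
        if t * t % p == p - 1:  # t is a square root of -1 modulo p
            break
    else:
        return None
    r0, r1 = p, t
    while r1 * r1 > p:  # Cornacchia: Euclid until the remainder drops below sqrt(p)
        r0, r1 = r1, r0 % r1
    y = math.isqrt(p - r1 * r1)
    return (r1, y) if r1 * r1 + y * y == p else None


def four_squares(n: int, seed: int = 0, attempts: int = 1000) -> tuple[int, int, int, int]:
    """Find (a, b, c, d) with a^2 + b^2 + c^2 + d^2 = n for n >= 0."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return (0, 0, 0, 0)
    rng, scale = random.Random(seed), 1
    while n % 4 == 0:  # solve n / 4, then double every entry
        n, scale = n // 4, scale * 2
    small = {0: (0, 0), 1: (1, 0), 2: (1, 1)}
    for _ in range(attempts):
        a = rng.randint(0, math.isqrt(n))
        b = rng.randint(0, math.isqrt(n - a * a))
        p = n - a * a - b * b
        cd = small.get(p)
        if cd is None and p % 4 == 1 and is_probable_prime(p):
            cd = two_squares_of_prime(p, rng)
        if cd is not None:
            return (scale * a, scale * b, scale * cd[0], scale * cd[1])
    # Safety net for unlucky inputs: exhaustive search (only sensible for small n).
    for a in range(math.isqrt(n) + 1):
        for b in range(math.isqrt(n - a * a) + 1):
            for c in range(math.isqrt(n - a * a - b * b) + 1):
                r = n - a * a - b * b - c * c
                d = math.isqrt(r)
                if d * d == r:
                    return (scale * a, scale * b, scale * c, scale * d)
    raise AssertionError("unreachable by Lagrange's four-square theorem")


def approximating_state(phi: float, k: int, seed: int = 0):
    """Integer numerators (re, im) of the four entries of |v>; divide each by 2**k."""
    x = math.floor(2**k * math.cos(phi))
    y = math.floor(2**k * math.sin(phi))
    a, b, c, d = four_squares(4**k - x * x - y * y, seed)
    return [(x, y), (0, 0), (a, b), (c, d)]


entries = approximating_state(0.7, 20)
print(sum(re * re + im * im for re, im in entries) == 4**20)  # |v> is a unit vector
```

Working with integer numerators rather than floating-point amplitudes makes the unit-norm condition an exact integer identity, which is what the exact synthesis step needs.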

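Furthermore, in estimating the classical complexity of the algorithm for finding the approximating circuit, we
rely on the observation that

$$4^k - \lfloor 2^k\cos\varphi\rfloor^2 - \lfloor 2^k\sin\varphi\rfloor^2 < 4\cdot 2^k + \mathrm{Const} \in O(2^k),$$

which follows from $x^2 - \lfloor x\rfloor^2 < 2x$ for $x > 0$, applied to $x = 2^k\cos\varphi$ and $x = 2^k\sin\varphi$.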
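A quick numerical sanity check of this bound, as a sketch with hypothetical helper names and floating-point evaluation of $\cos$ and $\sin$ (hence moderate $k$ only). Since $M < 4\cdot 2^k = 2^{k+2}$, the right-hand side has at most $k + 2$ bits, which is what makes $\log M \in O(k)$ in the complexity estimate.

```python
import math


def rhs(phi: float, k: int) -> int:
    """M = 4^k - floor(2^k cos phi)^2 - floor(2^k sin phi)^2 (floats: keep k moderate)."""
    x = math.floor(2**k * math.cos(phi))
    y = math.floor(2**k * math.sin(phi))
    return 4**k - x * x - y * y


def check_bound(max_k: int = 30, samples: int = 100) -> bool:
    """Check 0 <= M < 4 * 2^k on a grid of angles strictly inside (0, pi/2)."""
    for k in range(1, max_k + 1):
        for j in range(1, samples + 1):
            m = rhs(j * (math.pi / 2) / (samples + 1), k)
            if not 0 <= m < 4 * 2**k:
                return False
    return True


print(check_bound())
```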

2.2 Precision and complexity analysis 

Let us introduce $\gamma = \left(\lfloor 2^k\cos\varphi\rfloor + i\lfloor 2^k\sin\varphi\rfloor\right)/2^k$ and express $|v\rangle$ as:

$$|v\rangle = \gamma|00\rangle + |1\rangle\otimes|g\rangle.$$

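The application of the circuit $C$ controlled on the first qubit will transform $(\alpha|0\rangle + \beta|1\rangle)\otimes|00\rangle$ into:

$$\alpha|000\rangle + \beta\gamma|100\rangle + \beta|11\rangle\otimes|g\rangle.$$

The distance of the result to the desired state $\alpha|000\rangle + \beta e^{i\varphi}|100\rangle$ is:

$$\sqrt{|\beta|^2\,|e^{i\varphi} - \gamma|^2 + |\beta|^2\,\||g\rangle\|^2}.$$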

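By the choice of $\gamma$ we have $|\gamma - e^{i\varphi}| < \sqrt{2}/2^k$; therefore the first term in the sum above is in $O(1/2^{2k})$. The norm squared
of $|g\rangle$ equals $1 - |\gamma|^2$. The complex number $\gamma$ approximates $e^{i\varphi}$, and the distance of its absolute value to one can
be estimated using the triangle inequality:

$$\big||\gamma| - |e^{i\varphi}|\big| \le |\gamma - e^{i\varphi}|.$$

Since $|\gamma| \le 1$, we have $1 - |\gamma|^2 = (1 - |\gamma|)(1 + |\gamma|) \le 2(1 - |\gamma|)$; therefore, $1 - |\gamma|^2$ is in $O(1/2^k)$. In summary, the distance to the approximation is in $O(1/2^{0.5k})$.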
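The estimate can also be checked on explicit state vectors. In this sketch (helper names hypothetical), the three qubits are indexed as $4q_0 + 2q_1 + q_2$ with the control qubit first, and $|g\rangle$ is replaced by $\||g\rangle\|\,|0\rangle$; this is legitimate because $|11\rangle\otimes|g\rangle$ is orthogonal to the target state, so only the norm of $|g\rangle$ enters the distance. The comparison value $\sqrt{2/4^k + 2\sqrt{2}/2^k}$ is an explicit bound assumed from $|\gamma - e^{i\varphi}| < \sqrt{2}/2^k$ (each floor loses less than $2^{-k}$) and $1 - |\gamma|^2 \le 2|\gamma - e^{i\varphi}|$.

```python
import cmath
import math


def gamma_of(phi: float, k: int) -> complex:
    """gamma = (floor(2^k cos phi) + i floor(2^k sin phi)) / 2^k."""
    return complex(math.floor(2**k * math.cos(phi)), math.floor(2**k * math.sin(phi))) / 2**k


def output_distance(alpha: complex, beta: complex, phi: float, k: int) -> float:
    """|| (alpha|000> + beta|1>|v>) - (alpha|000> + beta e^{i phi}|100>) ||."""
    gamma = gamma_of(phi, k)
    g_norm = math.sqrt(max(0.0, 1.0 - abs(gamma) ** 2))  # ||g|| = sqrt(1 - |gamma|^2)
    out = [0j] * 8
    out[0b000] = alpha
    out[0b100] = beta * gamma   # beta * gamma |100>
    out[0b110] = beta * g_norm  # beta |11>|g>, with |g> replaced by ||g|| |0>
    target = [0j] * 8
    target[0b000] = alpha
    target[0b100] = beta * cmath.exp(1j * phi)
    return math.sqrt(sum(abs(o - t) ** 2 for o, t in zip(out, target)))


def error_bound(k: int) -> float:
    """Assumed explicit bound sqrt(2/4^k + 2*sqrt(2)/2^k) for |beta| <= 1."""
    return math.sqrt(2 / 4**k + 2 * math.sqrt(2) / 2**k)


alpha, beta = 0.6, 0.8j  # |alpha|^2 + |beta|^2 = 1
for k in (4, 8, 16, 24):
    print(k, output_distance(alpha, beta, 0.9, k), error_bound(k))
```

Each increase of $k$ by two roughly halves the distance, in line with the $O(1/2^{0.5k})$ estimate.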

The same estimate holds if we consider the circuit $C$ as a part of a larger system. In this case we start
with the state $(\alpha|\psi_0\rangle\otimes|0\rangle + \beta|\psi_1\rangle\otimes|1\rangle)\otimes|00\rangle$. A similar analysis shows that the distance to the approximation remains
$O(1/2^{0.5k})$.

As shown in [8], it is possible to find a circuit that prepares $|v\rangle$ using $O(k)$ Clifford and T gates ([8], Lemma 20
(Column lemma)). The classical complexity of constructing a quantum circuit implementing $|v\rangle$ is in $O(k)$. In the
controlled version of this circuit the number of gates remains $O(k)$ ([5], Theorem 1). Since the precision is $\varepsilon \in O(1/2^{0.5k})$,
it suffices to take $k \in O(\log(1/\varepsilon))$; in summary, we need $O(\log(1/\varepsilon))$ gates to achieve precision $\varepsilon$. Because the right-hand side
$M$ of the Diophantine equation is in $O(2^k)$, we have $\log M \in O(k)$, and the complexity of the classical algorithm for constructing the
entire approximating circuit is thus dominated by the complexity of finding a solution to the Diophantine equation, which is in $O(\log^2(1/\varepsilon)\log\log(1/\varepsilon))$.
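Inverting the error estimate gives the denominator exponent $k$ for a target precision. The sketch below assumes the explicit bound $\sqrt{2/4^k + 2\sqrt{2}/2^k}$, which follows from $|\gamma - e^{i\varphi}| < \sqrt{2}/2^k$ and $1 - |\gamma|^2 \le 2|\gamma - e^{i\varphi}|$ when $|\beta| \le 1$; the function names are hypothetical, and the constants hidden in the $O(k)$ gate count of [8] are not modeled.

```python
import math


def error_bound(k: int) -> float:
    """Assumed explicit bound on the approximation error for denominator 2^k."""
    return math.sqrt(2 / 4**k + 2 * math.sqrt(2) / 2**k)


def choose_k(eps: float) -> int:
    """Smallest k >= 1 with error_bound(k) <= eps."""
    if not 0 < eps < 1:
        raise ValueError("eps must lie in (0, 1)")
    k = 1
    while error_bound(k) > eps:
        k += 1
    return k


for eps in (1e-1, 1e-3, 1e-6):
    print(eps, choose_k(eps), 2 * math.log2(1 / eps))
```

Under this assumed bound, $k$ grows like $2\log_2(1/\varepsilon)$ plus a small constant, consistent with $k \in O(\log(1/\varepsilon))$.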
2.3 How many ancillae are needed?

A straightforward count shows that the number of ancillae used is three. However, we can get by with only
two ancillae. To understand how, we need to go into the details of the proof of Lemma 20 (Column lemma) from [8]. It
shows how to find a sequence of two-level unitaries of type $iX$, $T^{-m}(iH)T^{m}$, and $W$ [8], of length $O(k)$, that
prepares a state with the denominator $2^k$. A controlled version of a two-level unitary is again a two-level unitary. In [8],
Lemma 24, it was also shown that any such unitary required can be implemented using no extra ancillae. Therefore,
the controlled version of the circuit $C$ will not use any additional ancilla, and we need only two of them in total.

3 Lower bound on the number of gates when ancillae are allowed 

Lemma 1. Let $G$ be a universal library, and let $M_V$ be the set of unitaries that simulate a unitary $V$ acting on $n$ qubits
using $m$ ancillary qubits:

$$M_V = \{U \in U(2^{m+n}) \mid U(|0\rangle\otimes|\psi\rangle) = |0\rangle\otimes(V|\psi\rangle) \text{ for all } |\psi\rangle\}.$$

Then, for any $\varepsilon$ there exists a unitary $V(\varepsilon)$ such that the number of gates from $G$ needed to construct a unitary
within distance $\varepsilon$ of $M_{V(\varepsilon)}$ is in $\Omega(\log(1/\varepsilon))$.

We use a volume argument similar to the one presented in [4].

Let $N = 2^n$, let $\rho$ be the distance induced by the Frobenius norm, and let $\mu$ be the Haar measure on $U(N)$. For a unitary
$U$ we define the volume of its $\varepsilon$-neighbourhood as:

$$v(U, \varepsilon) = \mu\{V \in U(N) \mid \rho(M_V, U) < \varepsilon\}.$$

Let $G_k$ be the set of all unitaries that can be constructed using $k$ gates from the library $G$. Suppose that for any unitary
$V$ we can find a unitary $U$ from $G_k$ within distance $\varepsilon$ of $M_V$. This implies:

$$\mu(U(N)) \le \sum_{U \in G_k} v(U, \varepsilon) \le |G|^k \max_{U} v(U, \varepsilon).$$

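We will show that the volume $v(U, \varepsilon)$ is upper bounded by $C_0\varepsilon^{N^2}$, for some constant $C_0$; therefore:

$$\mu(U(N)) \le |G|^k C_0\varepsilon^{N^2}, \quad\text{i.e.,}\quad k \ge \frac{\log\mu(U(N)) + N^2\log(1/\varepsilon) - \log C_0}{\log|G|}. \qquad (1)$$
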
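We next show how to estimate $v(U, \varepsilon)$. Let $U_0$ be the submatrix of $U$ defined as follows:

$$U_0 := \{(\langle 0|\otimes\langle e_i|)\,U\,(|0\rangle\otimes|e_j\rangle)\}_{i,j},$$

where $\{|e_j\rangle\}$ is the standard (computational) basis in $\mathbb{C}^N$. Since every element of $M_V$ has $V$ as the corresponding submatrix,
and the distance $\rho$ is induced by the Frobenius norm (which is at least the Frobenius norm of any submatrix), we have $\rho(U, M_V) \ge \rho(U_0, V)$. Therefore:

$$v(U, \varepsilon) = \mu\{V \mid \rho(M_V, U) < \varepsilon\} \le \mu\{V \mid \rho(U_0, V) < \varepsilon\}.$$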

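Let us define $V_{\min}$ to be a unitary closest to $U_0$. To estimate $v(U, \varepsilon)$ it suffices to consider the case when $\rho(V_{\min}, U_0) < \varepsilon$
(otherwise the set above is empty). The distance $\rho$ is unitarily invariant, therefore $\rho(V_{\min}^\dagger U_0, I) < \varepsilon$ and, by the invariance of the Haar measure,

$$\mu\{V \mid \rho(U_0, V) < \varepsilon\} = \mu\{V \mid \rho(V_{\min}^\dagger U_0, V) < \varepsilon\}.$$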

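From the triangle inequality,

$$\rho(I, V) \le \rho(V_{\min}^\dagger U_0, I) + \rho(V_{\min}^\dagger U_0, V),$$

we conclude that

$$\{V \mid \rho(V_{\min}^\dagger U_0, V) < \varepsilon\} \subseteq \{V \mid \rho(I, V) < 2\varepsilon\}.$$

Finally,

$$v(U, \varepsilon) \le \mu\{V \mid \rho(I, V) < 2\varepsilon\}.$$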

As shown in 0], there exists a constant $C_0$ such that the volume of the ball $\{V \mid \rho(I, V) < 2\varepsilon\}$ is less than $C_0\varepsilon^{N^2}$.

Estimate (1) on $k$ shows that we need circuits of size at least $\Omega(\log(1/\varepsilon))$ to cover the full group $U(N)$. If $k$ is
chosen in such a way that inequality (1) does not hold, then, by the volume argument, there exists a unitary $V(\varepsilon)$
such that it is not possible to approximate any unitary from $M_{V(\varepsilon)}$ with precision $\varepsilon$ using at most $k$ gates.
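The volume bound $\mu(U(N)) \le |G|^k C_0\varepsilon^{N^2}$ can be turned into an explicit number. The sketch below is purely illustrative: it assumes a Haar measure normalized to $\mu(U(N)) = 1$, sets the unspecified constant $C_0$ to 1 by default, and uses a hypothetical library size $|G| = 8$; only the linear dependence on $\log(1/\varepsilon)$ is meaningful.

```python
import math


def gate_lower_bound(eps: float, n_qubits: int, library_size: int, c0: float = 1.0) -> int:
    """Smallest integer k with 1 <= library_size**k * c0 * eps**(N*N), N = 2**n_qubits.

    Assumes a Haar measure normalized to total volume 1; c0 is a placeholder
    for the unspecified constant C_0 in the ball-volume estimate.
    """
    n_dim = 2**n_qubits
    k = (n_dim * n_dim * math.log(1 / eps) - math.log(c0)) / math.log(library_size)
    return max(0, math.ceil(k))


# Hypothetical library with 8 distinct gates, single target qubit (N = 2).
for eps in (1e-3, 1e-6, 1e-9):
    print(eps, gate_lower_bound(eps, n_qubits=1, library_size=8))
```

Each additional factor of $10^3$ in precision adds the same number of gates to the bound, which is the $\Omega(\log(1/\varepsilon))$ behaviour stated in Lemma 1.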
4 Future work



There are some interesting questions that remain to be answered. The first one concerns the practicality of the proposed
construction. In particular, what are the constants hidden behind the big-O notation in our approach, and can they be
optimized (while further optimizations are only possible up to a multiplicative factor, they are nevertheless important
for practical purposes)? The original algorithm proposed in [8] uses a decomposition into single and two-level unitaries.
Each single and two-level unitary may have a relatively large implementation cost (yet one resulting in a blow-up by at most a constant
factor [9]). An example is given by the CNOT gate, whose controlled version, the Toffoli gate,
requires a strictly positive number of T gates, whereas none are needed for constructing the CNOT itself. Furthermore,
the T gate is known to be more difficult to implement fault-tolerantly than any of the Clifford gates. Next, what
are the possible trade-offs between adding/reducing ancillae and the gate count? Is it possible to use other efficiently
solvable Diophantine equations to discover approximations of other types of gates? Lastly, does there exist an efficient
algorithm to round off single-qubit unitaries to single-qubit unitaries over the ring $\mathbb{Z}[i, 1/\sqrt{2}]$, eliminating the need
for ancillary qubits altogether?